Efficient Floating-point Based Block LU Decomposition on FPGAs

Authors

  • Gokul Govindu
  • Viktor K. Prasanna
  • Vikash Daga
  • Sridhar Gangadharpalli
  • V. Sridhar
Abstract

In this paper, we propose an architecture for floating-point based LU decomposition of large matrices. Our proposed architecture is based on the well-known concept of blocking and uses pipelined floating-point units to obtain high throughput. We first analyze the effects of block size and of the deeply pipelined floating-point units on the performance of the architecture. We then compare the performance of our double-precision based design with that of a general-purpose processor (GPP) based design. Initial results show that an improvement of up to 23x in the total computation time can be achieved. Finally, we analyze the impact of algorithm-level design (by varying block size) on the system-wide energy dissipation and resource usage of our designs. Categories: 1. Theory, Mapping and Parallelization and 4. Applications
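
The abstract does not reproduce the architecture itself, so the following is only a software sketch of the general blocking idea it refers to, not the authors' hardware design. It assumes a square matrix that needs no pivoting (for example, a diagonally dominant one); the names `block_lu`, `matmul`, and the block-size parameter `b` are hypothetical and chosen for illustration.

```python
def block_lu(a, b):
    """Right-looking blocked LU without pivoting (illustrative sketch).

    a: square matrix as a list of rows; b: block size (assumed >= 1).
    Returns (L, U) with L unit lower triangular and U upper triangular.
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a float copy
    for k in range(0, n, b):
        e = min(k + b, n)  # end of the current block (handles a ragged last block)
        # Step 1: factor the panel (columns k..e-1, rows k..n-1) with unblocked LU.
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]              # multiplier, stored in place as L
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: forward-substitute with the unit-lower diagonal block
        # to obtain the block row of U to the right of the panel.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 3: update the trailing submatrix with a matrix-matrix product.
        for i in range(e, n):
            for j in range(k, e):
                lij = a[i][j]
                for c in range(e, n):
                    a[i][c] -= lij * a[j][c]
    L = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(x, y):
    """Plain triple-loop product, used only to check that L * U reproduces A."""
    n, m, p = len(x), len(y), len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(m)) for j in range(p)]
            for i in range(n)]
```

Calling `block_lu(A, b)` with different values of `b` should give the same factors up to rounding; only the order of operations changes. The trailing update in step 3 is a matrix-matrix product and typically dominates the work in a blocked factorization, which is one reason block size matters for performance.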

Similar resources

Multi-FPGA based High Performance LU Decomposition

LU Decomposition is a linear algebra routine that is used to bring down the complexity of solving a system of linear equations with multiple RHS. Its application can be found in computational physics (modeling 2-D structures), image processing, and computational chemistry (design and analysis of molecular structures). This paper investigates the hardware software co-design of large scale block-...

Perspectives for the Use of Field Programmable Gate Arrays for Finite Element Computations

We have studied how the solution of partial differential equations by means of finite element methods could be accelerated using Field Programmable Gate Arrays (FPGAs). First, we discuss in general the capabilities of current FPGA technology for floating-point implementations of number crunching. Based on practical results for basic floating-point operators performance limits are outlined. Then...

Implementation of LU Decomposition and QR Decomposition on Parallel Processing Systems

One of the earliest attempts to implement LU Decomposition with special-purpose hardware used systolic/wavefront arrays [2]. Different proposals for the processing elements (PEs) of systolic/wavefront arrays are provided in [3][4][5]. These ideas were not implemented in circuitry at that time, and the performance of these architectures was not quantitatively evaluated either. In 1994, E. Casseau[6] i...

Variable Precision Floating-Point Divide and Square Root for Efficient FPGA Implementation of Image and Signal Processing Algorithms

Field Programmable Gate Arrays (FPGAs) are frequently used to accelerate signal and image processing algorithms due to their flexibility, relatively low cost, high performance and fast time to market. For those applications where the data has large dynamic range, floating-point arithmetic is desirable due to the inherent limitations of fixed-point arithmetic. Moreover, optimal reconfigurable ha...

Exploiting mixed-mode parallelism for matrix operations on the HERA architecture through reconfiguration

Recent advances in multi-million-gate platform FPGAs have made it possible to design and implement complex parallel systems on a programmable chip (PSOPCs) that also incorporate hardware floating-point units (FPUs). These options take advantage of resource reconfiguration. In contrast to the majority of the FPGA community that still employs reconfigurable logic to develop algorithm-specific cir...

Publication date: 2004